Application Serial No. 10/559,573 
Attorney Docket No. 0670-7063 



The listing of claims will replace all prior versions, and listings, of claims in the 
application: 

Listing of Claims: 

1. (Currently Amended) A voice data selector, comprising: 

memory means for storing a plurality of voice data expressing voice waveforms; 

search means for inputting text information expressing a text and retrieving voice 
data expressing a waveform of a voice unit whose reading is common to that of a voice 
unit which constitutes the text from among the voice data; and 

selection means for selecting each one of voice data corresponding to each 
voice unit which constitutes the text from among the searched voice data so that a value 
obtained by totaling difference of pitches in boundaries of adjacent voice units in the 
whole text may become minimum,

speech synthesis means of generating data expressing synthetic speech by 
combining selected voice data mutually; and 

lacked portion synthesis means of synthesizing voice data expressing a 
waveform of a voice unit in regard to the voice unit, on which the selection means was 
not able to select voice data, among voice units in the text without using voice data 
which the memory means stores, and in that the speech synthesis means generates 
data expressing synthetic speech by combining voice data, which the selection means 
selected, with voice data which the lacked portion synthesis means synthesizes.
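
For illustration only, the selection recited in claim 1 (one voice data per voice unit, chosen so that the total of pitch differences at the boundaries of adjacent voice units becomes minimum) can be sketched as a dynamic-programming search. The claim does not prescribe a search procedure. The function name `select_min_pitch_gap` and the representation of each candidate as a (start pitch, end pitch) pair are assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: (pitch at start of unit, pitch at end of unit).
Candidate = Tuple[float, float]


def select_min_pitch_gap(
    candidates: Sequence[Sequence[Candidate]],
) -> Tuple[List[int], float]:
    """Pick one candidate index per voice unit so that the summed absolute
    pitch difference at every boundary between adjacent units is minimal."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every voice unit needs at least one candidate")

    # best[j]: minimal total gap over all paths ending at candidate j
    # of the unit processed so far.
    best = [0.0] * len(candidates[0])
    back: List[List[int]] = []  # back-pointers, one list per boundary

    for prev, cur in zip(candidates, candidates[1:]):
        new_best, pointers = [], []
        for start_pitch, _end_pitch in cur:
            costs = [
                best[i] + abs(prev[i][1] - start_pitch)
                for i in range(len(prev))
            ]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min])
            pointers.append(i_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path back from the last unit to the first.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path, total
```

The search keeps only the cheapest path into each candidate, so its cost grows with the number of boundaries times the square of the candidates per unit, rather than with the number of all possible combinations.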

2. (Canceled) 

3. (Currently Amended) A voice data selection method, the method comprising 
the steps of: 

storing a plurality of voice data expressing voice waveforms; 

inputting text information expressing a text, retrieving voice data expressing a 
waveform of a voice unit whose reading is common to that of a voice unit which 
constitutes the text from among the voice data; and 

selecting each one of voice data corresponding to each voice unit which 
constitutes the text from among the retrieved voice data so that a value obtained by 
totaling difference of pitches in boundaries of adjacent voice units in the whole text may 
become minimum,

generating data expressing synthetic speech by combining selected voice data 
mutually; and 

synthesizing voice data expressing a waveform of a voice unit in regard to the 
voice unit, on which voice data was not able to be selected, among voice units in the 
text without using the stored voice data, and in that generating data expressing 
synthetic speech by combining the selected voice data with synthesized voice data.

4. (Canceled) 

5. (Currently Amended) A voice selector, comprising: 

memory means for storing a plurality of voice data expressing voice waveforms; 

prediction means for predicting time series change of pitch of a voice unit by 
inputting text information expressing a text and performing cadence prediction for a 
voice unit which constitutes the text concerned; and 

selection means for selecting from among the voice data the voice data which 
expresses a waveform of a voice unit whose reading is common to that of a voice unit 
which constitutes the text, and whose time series change of pitch has the highest 
correlation with prediction results by the prediction means,

speech synthesis means of generating data expressing synthetic speech by 
combining selected voice data mutually; and 

lacked portion synthesis means of synthesizing voice data expressing a 
waveform of a voice unit in regard to the voice unit, on which the selection means was 
not able to select voice data, among voice units in the text without using voice data 
which the memory means stores, and in that the speech synthesis means generates 
data expressing synthetic speech by combining voice data, which the selection means 
selected, with voice data which the lacked portion synthesis means synthesizes.

6. (Previously Presented) The voice selector according to claim 5, wherein the 
selection means may specify strength of correlation between time series change of pitch 
of voice data, and results of prediction by the prediction means on the basis of results of 
regression calculation which performs primary regression between time series change 
of pitch of a voice unit which voice data expresses, and time series change of pitch of a 
voice unit in the text whose reading is common to the voice unit concerned. 

7. (Previously Presented) The voice selector according to claim 5, wherein the 
selection means may specify strength of correlation between time series change of pitch 
of voice data, and results of prediction by the prediction means on the basis of a 
correlation coefficient between time series change of pitch of a voice unit which voice 
data expresses, and time series change of pitch of a voice unit in the text whose 
reading is common to the voice unit concerned. 
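
For illustration only, the correlation coefficient of claim 7 can be read as a Pearson coefficient between two pitch contours sampled at the same number of points. That reading, the equal-length sampling, and the names `pearson` and `select_by_correlation` are assumptions of this sketch and are not stated in the claims.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length pitch contours."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("contours must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A flat contour has no defined correlation; treated as 0 here
        # (a choice made for this sketch).
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_by_correlation(
    predicted: Sequence[float],
    candidates: Sequence[Sequence[float]],
) -> int:
    """Index of the candidate whose pitch contour correlates most strongly
    with the predicted contour."""
    if not candidates:
        raise ValueError("no candidates to select from")
    return max(
        range(len(candidates)),
        key=lambda i: pearson(candidates[i], predicted),
    )
```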

8. (Currently Amended) A voice selector, comprising: 

memory means for storing a plurality of voice data expressing voice waveforms; 

prediction means for predicting time length of a voice unit and time series change of 
pitch of the voice unit concerned by inputting text information expressing a text and 
performing cadence prediction for the voice unit in the text concerned; and 

selection means for specifying an evaluation value of each voice data expressing 
a waveform of a voice unit whose reading is common to a voice unit in the text and 
selecting voice data whose evaluation value expresses the highest evaluation, and in 
that the evaluation value is obtained from a function of a numerical value which 
expresses correlation between time series change of pitch of a voice unit which voice 
data expresses, and prediction results of time series change of pitch of a voice unit in 
the text whose reading is common to the voice unit concerned, and a function of 
difference between prediction results of time length of a voice unit which the voice data 
concerned expresses, and time length of a voice unit in the text whose reading is 
common to the voice unit concerned,

speech synthesis means of generating data expressing synthetic speech by 
combining selected voice data mutually; and 

lacked portion synthesis means of synthesizing voice data expressing a 
waveform of a voice unit in regard to the voice unit, on which the selection means was 
not able to select voice data, among voice units in the text without using voice data 
which the memory means stores, and in that the speech synthesis means generates 
data expressing synthetic speech by combining voice data, which the selection means 
selected, with voice data which the lacked portion synthesis means synthesizes.

9. (Original) The voice selector according to claim 8, wherein the numerical 
value expressing correlation comprises a gradient of a primary function obtained by the 
primary regression between time series change of pitch of a voice unit which voice data 
expresses, and time series change of pitch of a voice unit in the text whose reading is 
common to that of the voice unit concerned. 
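
For illustration only, the "primary regression" of claims 9 and 10 can be read as a first-order (straight-line) least-squares fit between two pitch contours, yielding the gradient and the intercept of the fitted primary function. That reading, the equal-length sampling of the two contours, and the name `primary_regression` are assumptions of this sketch.

```python
from typing import Sequence, Tuple


def primary_regression(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of y = gradient * x + intercept.

    x: pitch contour expressed by the stored voice data (assumed sampling).
    y: predicted pitch contour of the matching voice unit in the text.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("contours must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x is constant; the gradient is undefined")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept
```

How the gradient or the intercept is then mapped to an evaluation value is not specified in the claims and is left out of the sketch.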

10. (Original) The voice selector according to claim 8, wherein the numerical 
value expressing correlation comprises an intercept of a primary function obtained by 
the primary regression between time series change of pitch of a voice unit which voice 
data expresses, and time series change of pitch of a voice unit in the text whose 
reading is common to that of the voice unit concerned. 

11. (Previously Presented) The voice selector according to claim 8, wherein the 
numerical value expressing correlation comprises a correlation coefficient between time 
series change of pitch of a voice unit which voice data expresses, and prediction results 
of time series change of pitch of a voice unit in the text whose reading is common to 
that of the voice unit concerned. 

12. (Previously Presented) The voice selector according to claim 8, wherein the 
numerical value expressing correlation comprises the maximum value of correlation 
coefficients between a function which what is given various bit count cyclic shifts to data 
expressing time series change of pitch of a voice unit which voice data expresses, and 
a function expressing prediction results of time series change of pitch of a voice unit in 
the text whose reading is common to that of the voice unit concerned. 
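
For illustration only, the maximum over cyclic shifts in claim 12 can be sketched by rotating the sampled pitch contour of the voice data by every possible offset and keeping the largest correlation coefficient with the predicted contour. Interpreting the cyclic shifts as rotations of a sample sequence, using a Pearson coefficient, and the names `pearson` and `max_shifted_correlation` are assumptions of this sketch.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # flat contour: treated as uncorrelated in this sketch
    return sxy / math.sqrt(sxx * syy)


def max_shifted_correlation(
    candidate: Sequence[float], predicted: Sequence[float]
) -> Tuple[float, int]:
    """Largest correlation over all cyclic shifts of `candidate`,
    together with the shift (in samples) that achieves it."""
    samples = list(candidate)
    best_value, best_shift = -2.0, 0  # below any possible coefficient
    for shift in range(len(samples)):
        rotated = samples[shift:] + samples[:shift]
        value = pearson(rotated, predicted)
        if value > best_value:
            best_value, best_shift = value, shift
    return best_value, best_shift
```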

13. (Original) The voice selector according to any one of claims 5 to 12, wherein 
the memory means stores phonetic data expressing reading of voice data with 
associating it with the voice data concerned; and 

wherein the selection means treats voice data, with which phonetic data 
expressing the reading agreeing with the reading of a voice unit in the text is 
associated, as voice data expressing a waveform of a voice unit whose reading is 
common to the voice unit concerned. 

14.-15. (Canceled) 

16. (Currently Amended) A voice selection method, the method comprising the 
steps of: 

storing a plurality of voice data expressing voice waveforms; 

predicting time series change of pitch of a voice unit by inputting text information 
expressing a text and performing cadence prediction for a voice unit which constitutes 
the text concerned; and 

selecting from among the voice data the voice data which expresses a waveform 
of a voice unit whose reading is common to that of a voice unit which constitutes the 
text, and whose time series change of pitch has the highest correlation with prediction 
results by the prediction means,

generating data expressing synthetic speech by combining selected voice data 
mutually; and 

synthesizing voice data expressing a waveform of a voice unit in regard to the 
voice unit, on which voice data was not able to be selected, among voice units in the 
text without using the stored voice data, and in that generating data expressing 
synthetic speech by combining the selected voice data with synthesized voice data.

17. (Currently Amended) A voice selection method, the method comprising the 
steps of: 

storing a plurality of voice data expressing voice waveforms; 

predicting time length of voice unit and time series change of pitch of the voice 
unit concerned by inputting text information expressing a text and performing cadence 
prediction for a voice unit in the text concerned; and 

specifying an evaluation value of each voice data expressing a waveform of a 
voice unit whose reading is common to a voice unit in the text and selecting voice data 
whose evaluation value expresses the highest evaluation, and in that the evaluation 
value is obtained from a function of a numerical value which expresses correlation 
between time series change of pitch of a voice unit which voice data expresses, and 
prediction results of time series change of pitch of a voice unit in the text whose reading 
is common to the voice unit concerned, and a function of difference between prediction 
results of time length of a voice unit which the voice data concerned expresses, and 
time length of a voice unit in the text whose reading is common to the voice unit 
concerned,

generating data expressing synthetic speech by combining selected voice data 
mutually; and 

synthesizing voice data expressing a waveform of a voice unit in regard to the 
voice unit, on which voice data was not able to be selected, among voice units in the 
text without using the stored voice data, and in that generating data expressing 
synthetic speech by combining the selected voice data with synthesized voice data.

18.-19. (Canceled) 

20. (Currently Amended) A voice data selector, comprising: 

memory means for storing a plurality of voice data expressing voice waveforms; 

text information input means of inputting text information expressing a text; 

a search section for searching voice data which has a portion whose reading is 
common to that of a voice unit in a text which the text information expresses; and 

selection means for obtaining an evaluation value according to predetermined 
evaluation criteria on the basis of relationship between mutually adjacent voice data 
when each of the searched voice data is connected according to the text which text 
information expresses, and selecting combination of voice data, which is outputted, on 
the basis of the evaluation value concerned,

speech synthesis means of generating data expressing synthetic speech by 
combining selected voice data mutually; and 

lacked portion synthesis means of synthesizing voice data expressing a 
waveform of a voice unit in regard to the voice unit, on which the selection means was 
not able to select voice data, among voice units in the text without using voice data 
which the memory means stores, and in that the speech synthesis means generates 
data expressing synthetic speech by combining voice data, which the selection means 
selected, with voice data which the lacked portion synthesis means synthesizes.

21. (Original) The voice data selector according to claim 20, wherein the 
evaluation criterion is a criterion which determines an evaluation value which shows 
relationship between mutually adjacent voice data; and 

wherein the evaluation value is obtained on the basis of an evaluation expression 
which contains at least any one of a parameter which shows a feature of voice which 
the voice data expresses, a parameter which shows a feature of voice obtained by 
mutually combining voice which the voice data expresses, and a parameter which 
shows a feature relating to speech time length. 

22. (Original) The voice data selector according to claim 20, wherein the 
evaluation criterion is a criterion which determines an evaluation value which shows 
relationship between mutually adjacent voice data; and that the evaluation value 
includes a parameter which shows a feature of voice obtained by mutually combining 
voice which the voice data expresses, and is obtained on the basis of an evaluation 
expression which contains at least any one of a parameter which shows a feature of 
voice which the voice data expresses, and a parameter which shows a feature relating 
to speech time length. 

23. (Original) The voice data selector according to claim 21 or 22, wherein the 
parameter which shows a feature of voice obtained by mutually combining voice which 
the voice data expresses is obtained on the basis of difference between pitches in a 
boundary of mutually adjacent voice data in the case of selecting at a time one voice 
data corresponding to each voice unit which constitutes the text from among voice data 
which expressing waveforms of voice having a portion whose reading is common to that 
of a voice unit in a text which the text information expresses. 

24. (Previously Presented) The voice data selector according to any one of 
claims 20 to 22, wherein the evaluation criterion further includes a reference which 
determines an evaluation value which expresses correlation or difference between 
voice, which voice data expresses, and cadence prediction results of the cadence 
prediction means; and that the evaluation value is obtained on the basis of a function of 
a numerical value which expresses correlation between time series change of pitch of a 
voice unit which voice data expresses, and prediction results of time series change of 
pitch of a voice unit in the text whose reading is common to the voice unit concerned, 
and/or a function of difference between prediction results of time length of a voice unit 
which the voice data concerned expresses, and time length of a voice unit in the text 
whose reading is common to the voice unit concerned. 

25. (Original) The voice data selector according to claim 24, wherein the 
numerical value expressing correlation comprises a gradient and/or an intercept of a 
primary function obtained by the primary regression between time series change of 
pitch of a voice unit which voice data expresses, and time series change of pitch of a 
voice unit in the text whose reading is common to that of the voice unit concerned. 

26. (Previously Presented) The voice data selector according to claim 25, 
wherein the numerical value expressing correlation comprises a correlation coefficient 
between time series change of pitch of a voice unit which voice data expresses, and 
prediction results of time series change of pitch of a voice unit in the text whose reading 
is common to that of the voice unit concerned. 

27. (Previously Presented) The voice data selector according to claim 25, 
wherein the numerical value expressing correlation comprises the maximum value of 
correlation coefficients between a function which what is given various bit count cyclic 
shifts to data expressing time series change of pitch of a voice unit which voice data 
expresses, and a function expressing prediction results of time series change of pitch of 
a voice unit in the text whose reading is common to that of the voice unit concerned. 

28. (Previously Presented) The voice selector according to any one of claims 20 
to 22, wherein the memory means stores phonetic data expressing reading of voice 
data with associating it with the voice data concerned; and 

wherein the selection means treats voice data, with which phonetic data 
expressing reading agreeing with reading of a voice unit in the text is associated, as 
voice data expressing a waveform of a voice unit whose reading is common to the voice 
unit concerned. 

29. (Canceled) 

30. (Original) The voice data selector according to claim 29, comprising: 

lacked portion synthesis means for synthesizing voice data expressing a 
waveform of a voice unit in regard to a voice unit, on which the selection means is not 
able to select voice data, among voice units in the text without using voice data which 
the memory means stores, and in that the speech synthesis means generates data 
expressing synthetic speech by combining a voice data, which the selection means 
selects, with voice data which the lacked portion synthesis means synthesizes. 

31. (Currently Amended) A voice data selection method, the method comprising 
the steps of: 

storing a plurality of voice data expressing voice waveforms; 

inputting text information expressing a text; 

searching voice data which has a portion whose reading is common to that of a 
voice unit in a text which the text information expresses; 

obtaining an evaluation value according to predetermined evaluation criteria on 
the basis of relationship between mutually adjacent voice data when each of the 
searched voice data is connected according to a text which text information expresses; 
and 

selecting combination of voice data, which is outputted, on the basis of the 
evaluation value concerned,

generating data expressing synthetic speech by combining selected voice data 
mutually; and 

synthesizing voice data expressing a waveform of a voice unit in regard to the 
voice unit, on which voice data was not able to be selected, among voice units in the 
text without using the stored voice data, and in that generating data expressing 
synthetic speech by combining the selected voice data with synthesized voice data.

32. (Currently Amended) A program for causing a computer to function as: 

memory means for storing a plurality of voice data expressing voice waveforms; 

text information input means for inputting text information expressing a text; 

a search section for searching voice data which has a portion whose reading is 
common to that of a voice unit in a text which the text information expresses; and 

selection means for obtaining an evaluation value according to a predetermined 
evaluation criterion on the basis of relationship between mutually adjacent voice data 
when each of the searched voice data is connected according to a text which text 
information expresses, and selecting combination of voice data, which is outputted, on 
the basis of the evaluation value concerned;

speech synthesis means of generating data expressing synthetic speech by 
combining selected voice data mutually; and 

lacked portion synthesis means of synthesizing voice data expressing a 
waveform of a voice unit in regard to the voice unit, on which the selection means was 
not able to select voice data, among voice units in the text without using voice data 
which the memory means stores, and in that the speech synthesis means generates 
data expressing synthetic speech by combining voice data, which the selection means 
selected, with voice data which the lacked portion synthesis means synthesizes.



